A Comparison of One-Rate and Two-Rate Inference Frameworks for Site-Specific dN/dS Estimation.
Abstract
Two broad paradigms exist for inferring dN/dS, the ratio of nonsynonymous to synonymous substitution rates, from coding sequences: (i) a one-rate approach, where dN/dS is represented with a single parameter, or (ii) a two-rate approach, where dN and dS are estimated separately. The performances of these two approaches have been well studied in the specific context of proper model specification, i.e., when the inference model matches the simulation model. By contrast, the relative performances of one-rate vs. two-rate parameterizations when applied to data generated according to a different mechanism remain unclear. Here, we compare the relative merits of one-rate and two-rate approaches in the specific context of model misspecification by simulating alignments with mutation-selection models rather than with dN/dS-based models. We find that one-rate frameworks generally infer more accurate dN/dS point estimates, even when dS varies among sites. In other words, modeling dS variation may substantially reduce accuracy of dN/dS point estimates. These results appear to depend on the selective constraint operating at a given site. For sites under strong purifying selection ([Formula: see text]), one-rate and two-rate models show comparable performances. However, one-rate models significantly outperform two-rate models for sites under moderate-to-weak purifying selection. We attribute this distinction to the fact that, for these more quickly evolving sites, a given substitution is more likely to be nonsynonymous than synonymous. The data will therefore be relatively enriched for nonsynonymous changes, and modeling dS contributes excessive noise to dN/dS estimates. We additionally find that high levels of divergence among sequences, rather than the number of sequences in the alignment, are more critical for obtaining precise point estimates.
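The intuition that estimating dS separately can add noise to a dN/dS ratio can be illustrated with a deliberately simplified toy simulation. The sketch below is not the likelihood-based codon-model inference evaluated in the study, and it does not simulate under mutation-selection models. It assumes that substitution counts at a site are Poisson distributed, that the true dS equals 1, and that the neutral expected counts (`neutral_nonsyn`, `neutral_syn`) are known; all names and parameter values are hypothetical choices made for illustration only.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def compare_rmse(omega=0.5, neutral_nonsyn=15.0, neutral_syn=5.0,
                 replicates=5000, seed=1):
    """Return (one-rate RMSE, two-rate RMSE) for a toy count-based estimator.

    Assumptions (illustrative, not taken from the paper):
      * nonsynonymous counts ~ Poisson(omega * neutral_nonsyn)
      * synonymous counts    ~ Poisson(neutral_syn), i.e. true dS = 1
    """
    rng = random.Random(seed)
    sq_err_one = 0.0
    sq_err_two = 0.0
    used = 0
    for _ in range(replicates):
        n_count = poisson(rng, omega * neutral_nonsyn)
        s_count = poisson(rng, neutral_syn)
        if s_count == 0:
            # The two-rate ratio is undefined without synonymous changes,
            # so this replicate is skipped for both estimators.
            continue
        dn_hat = n_count / neutral_nonsyn
        one_rate = dn_hat                             # dS fixed to 1
        two_rate = dn_hat / (s_count / neutral_syn)   # dS estimated per site
        sq_err_one += (one_rate - omega) ** 2
        sq_err_two += (two_rate - omega) ** 2
        used += 1
    return math.sqrt(sq_err_one / used), math.sqrt(sq_err_two / used)


rmse_one, rmse_two = compare_rmse()
print(f"one-rate RMSE: {rmse_one:.3f}")
print(f"two-rate RMSE: {rmse_two:.3f}")
```

Under these assumptions the two-rate estimate inherits the sampling variance of the synonymous count in its denominator, so its error is larger than that of the one-rate estimate. The size of the gap depends entirely on the assumed expected counts and says nothing about the magnitudes reported in the study.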
Journal title:
- Genetics
Volume 204, Issue 2
Pages -
Publication date 2016